Exploring Semantic Information in Hindi WordNet for Hindi Dependency Parsing

Authors

  • Sambhav Jain
  • Naman Jain
  • Aniruddha Tammewar
  • Riyaz Ahmad Bhat
  • Dipti Misra Sharma
Abstract

In this paper, we present our efforts towards incorporating external knowledge from Hindi WordNet to aid dependency parsing. We conduct parsing experiments on Hindi, an Indo-Aryan language, using information from the concept ontologies in Hindi WordNet to complement the morpho-syntactic information already available. The work is driven by the insight that concept ontologies capture a specific real-world aspect of lexical items, one that is quite distinct from, and unlikely to be deduced from, morpho-syntactic information such as morphology, POS tags and chunks. We encode this complementary information as an additional feature for data-driven parsing and run experiments on datasets of different sizes. Over the baseline, we achieve a labeled attachment score (LAS) improvement of 1.1% when training on 1,000 sentences and 0.2% when training on 13,371 sentences. The improvements are statistically significant at p < 0.01. The larger improvement on 1,000 sentences suggests that the semantic information could help address the data sparsity problem.
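The abstract describes encoding WordNet concept information as an extra parser feature and reports gains in LAS. The following is a minimal sketch of that general idea, assuming a CoNLL-X-style token with a FEATS column. `TOY_ONTOLOGY`, its node names, the `sem=` feature name and the dependency labels are all hypothetical placeholders, not the authors' feature design or Hindi WordNet's actual ontology; `las` is the standard labeled attachment score (head and label both correct).

```python
from dataclasses import dataclass, replace

# Hypothetical stand-in for Hindi WordNet's concept ontology: lemma -> ontology node.
# Entries and node names are illustrative only, not taken from the real resource.
TOY_ONTOLOGY = {
    "आदमी": "Animate",     # "man"
    "घर": "Place",          # "house"
    "किताब": "Inanimate",   # "book"
}


@dataclass(frozen=True)
class Token:
    form: str
    lemma: str
    feats: str    # morphological features, as in a CoNLL-X FEATS column ("_" = empty)
    head: int     # 1-based index of the head token, 0 for the root
    deprel: str   # dependency label (illustrative values below)


def add_ontology_feature(tokens, ontology, default="NONE"):
    """Return copies of tokens with a hypothetical 'sem=<node>' appended to FEATS."""
    out = []
    for tok in tokens:
        node = ontology.get(tok.lemma, default)  # lemmas missing from the lexicon get a default
        sem = f"sem={node}"
        feats = sem if tok.feats == "_" else f"{tok.feats}|{sem}"
        out.append(replace(tok, feats=feats))    # original tokens are left unchanged
    return out


def las(gold, pred):
    """Labeled attachment score: fraction of tokens whose head AND label are both correct."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sentences must have the same length")
    correct = sum(g.head == p.head and g.deprel == p.deprel for g, p in zip(gold, pred))
    return correct / len(gold)


if __name__ == "__main__":
    gold = [
        Token("आदमी", "आदमी", "gen-m", 4, "k1"),
        Token("ने", "ने", "_", 1, "lwg__psp"),
        Token("घर", "घर", "_", 4, "k2"),
        Token("देखा", "देख", "_", 0, "main"),
    ]
    for tok in add_ontology_feature(gold, TOY_ONTOLOGY):
        print(tok.form, tok.feats)
    pred = [replace(gold[0], deprel="k2")] + gold[1:]  # one wrong label out of four
    print(las(gold, pred))  # 0.75
```

A real pipeline would also have to choose among a word's several synsets (for example by sense disambiguation or a first-sense heuristic), since different senses can sit under different ontology nodes; the sketch sidesteps this with a one-to-one toy mapping.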


Similar Resources

Exploring the effects of Sentence Simplification on Hindi to English Machine Translation System

Even though a lot of research has already been done on machine translation, translating complex sentences has remained a stumbling block in the process. To improve the performance of machine translation on complex sentences, simplifying the sentences becomes imperative. In this paper, we present a rule-based approach to address this problem by simplifying complex sentences in Hindi into multiple s...


An Insight into Role of Wordnet and Language Network for effective IR from Hindi Text Documents

This paper investigates the limitations of traditional Information Retrieval (IR) models and how semantic-based approaches overcome these limitations. Further, the paper analyzes a range of aspects of the language network representation of a text corpus and how different network properties can improve the results for different applications of IR. The paper analyzes Hindi Wordnet to exploi...


Exploring self training for Hindi dependency parsing

In this paper we explore the effect of self-training on Hindi dependency parsing. We consider a state-of-the-art Hindi dependency parser and apply self-training using a large raw corpus. We consider two types of raw corpus, one from the same domain as the training and testing data and the other from a different domain. We also do an experiment where we add a small amount of gold-standard data to the training s...


Exploring Self-training and Co-training for Dependency Parsing

We explore the effect of self-training and co-training on Hindi dependency parsing. We use Malt parser, a state-of-the-art Hindi dependency parser, and apply self-training using a large unannotated corpus. For co-training, we use MST parser, whose accuracy is comparable to that of Malt parser. Experiments are performed using two types of raw corpora: one from the same domain as the test data and...


Expansion of the First Hindi-Nepali Word-Net Based Bi-Lingual Dictionary and the advancement of the Human-Machine Interface

Natural Language Processing is introducing a new era in the field of Computer Science and Machine Translation. Human-machine interaction will play a very important role in the coming centuries as human dependence on machines increases. Word-Net was first introduced by Miller and Fellbaum in 1985. WordNet is a lexical database for human languages. It groups the Human...



Publication date: 2013